---
title: "InMemoryBM25Retriever"
id: inmemorybm25retriever
slug: "/inmemorybm25retriever"
description: "A keyword-based Retriever compatible with InMemoryDocumentStore."
---

# InMemoryBM25Retriever

A keyword-based Retriever compatible with InMemoryDocumentStore.

|  |  |
| --- | --- |
| **Most common position in a pipeline** | In query pipelines:   In a RAG pipeline, before a [`PromptBuilder`](../builders/promptbuilder.mdx)   In a semantic search pipeline, as the last component   In an extractive QA pipeline, before an [`ExtractiveReader`](../readers/extractivereader.mdx)  |
| **Mandatory init variables** | "document_store": An instance of [InMemoryDocumentStore](../../document-stores/inmemorydocumentstore.mdx) |
| **Mandatory run variables** | "query": A query string |
| **Output variables** | "documents": A list of documents (matching the query) |
| **API reference** | [Retrievers](/reference/retrievers-api) |
| **GitHub link** | https://github.com/deepset-ai/haystack/blob/main/haystack/components/retrievers/in_memory/bm25_retriever.py |

## Overview

`InMemoryBM25Retriever` is a keyword-based Retriever that fetches Documents matching a query from a temporary in-memory database. It determines the similarity between Documents and the query based on the BM25 algorithm, which computes a weighted word overlap between the two strings.

Since the `InMemoryBM25Retriever` matches strings based on word overlap, it’s often used to find exact matches to names of persons or products, IDs, or well-defined error messages. The BM25 algorithm is very lightweight and simple. Nevertheless, it can be hard to beat with more complex embedding-based approaches on out-of-domain data.

In addition to the `query`, the `InMemoryBM25Retriever` accepts other optional parameters, including `top_k` (the maximum number of Documents to retrieve) and `filters` to narrow down the search space.
Some relevant parameters that impact the BM25 retrieval must be defined when the corresponding `InMemoryDocumentStore` is initialized: these include the specific BM25 algorithm and its parameters.

## Usage

### On its own

```python
from haystack import Document
from haystack.components.retrievers.in_memory import InMemoryBM25Retriever
from haystack.document_stores.in_memory import InMemoryDocumentStore

document_store = InMemoryDocumentStore()
documents = [Document(content="There are over 7,000 languages spoken around the world today."),
			       Document(content="Elephants have been observed to behave in a way that indicates a high level of self-awareness, such as recognizing themselves in mirrors."),
			       Document(content="In certain parts of the world, like the Maldives, Puerto Rico, and San Diego, you can witness the phenomenon of bioluminescent waves.")]
document_store.write_documents(documents=documents)

retriever = InMemoryBM25Retriever(document_store=document_store)
retriever.run(query="How many languages are spoken around the world today?")
```

### In a Pipeline

#### In a RAG Pipeline

Here's an example of the Retriever in a retrieval-augmented generation pipeline:

```python
import os
from haystack import Document
from haystack import Pipeline
from haystack.components.builders.answer_builder import AnswerBuilder
from haystack.components.builders.prompt_builder import PromptBuilder
from haystack.components.generators import OpenAIGenerator
from haystack.components.retrievers.in_memory import InMemoryBM25Retriever
from haystack.document_stores.in_memory import InMemoryDocumentStore

## Create a RAG query pipeline
prompt_template = """
    Given these documents, answer the question.\nDocuments:
    {% for doc in documents %}
        {{ doc.content }}
    {% endfor %}

    \nQuestion: {{question}}
    \nAnswer:
    """

os.environ["OPENAI_API_KEY"] = "sk-XXXXXX"

rag_pipeline = Pipeline()
rag_pipeline.add_component(instance=InMemoryBM25Retriever(document_store=InMemoryDocumentStore()), name="retriever")
rag_pipeline.add_component(instance=PromptBuilder(template=prompt_template), name="prompt_builder")
rag_pipeline.add_component(instance=OpenAIGenerator(), name="llm")
rag_pipeline.add_component(instance=AnswerBuilder(), name="answer_builder")
rag_pipeline.connect("retriever", "prompt_builder.documents")
rag_pipeline.connect("prompt_builder", "llm")
rag_pipeline.connect("llm.replies", "answer_builder.replies")
rag_pipeline.connect("llm.metadata", "answer_builder.metadata")
rag_pipeline.connect("retriever", "answer_builder.documents")

## Draw the pipeline
rag_pipeline.draw("./rag_pipeline.png")

## Add Documents
documents = [Document(content="There are over 7,000 languages spoken around the world today."),
			       Document(content="Elephants have been observed to behave in a way that indicates a high level of self-awareness, such as recognizing themselves in mirrors."),
			       Document(content="In certain parts of the world, like the Maldives, Puerto Rico, and San Diego, you can witness the phenomenon of bioluminescent waves.")]
rag_pipeline.get_component("retriever").document_store.write_documents(documents)

## Run the pipeline
question = "How many languages are there?"
result = rag_pipeline.run(
            {
                "retriever": {"query": question},
                "prompt_builder": {"question": question},
                "answer_builder": {"query": question},
            }
        )
print(result['answer_builder']['answers'][0])
```

#### In a Document Search Pipeline

Here's how you can use this Retriever in a document search pipeline:

```python
from haystack import Document
from haystack.components.retrievers.in_memory import InMemoryBM25Retriever
from haystack.document_stores.in_memory import InMemoryDocumentStore
from haystack.pipeline import Pipeline

## Create components and a query pipeline
document_store = InMemoryDocumentStore()
retriever = InMemoryBM25Retriever(document_store=document_store)

pipeline = Pipeline()
pipeline.add_component(instance=retriever, name="retriever")

## Add Documents
documents = [Document(content="There are over 7,000 languages spoken around the world today."),
			       Document(content="Elephants have been observed to behave in a way that indicates a high level of self-awareness, such as recognizing themselves in mirrors."),
			       Document(content="In certain parts of the world, like the Maldives, Puerto Rico, and San Diego, you can witness the phenomenon of bioluminescent waves.")]
document_store.write_documents(documents)

## Run the pipeline
result = pipeline.run(data={"retriever": {"query":"How many languages are there?"}})

print(result['retriever']['documents'][0])
```
